Part-of-speech Tagging in French Te Experiments in Tagset

Author

  • Hongyan Jing
Abstract

Part-of-speech tagging is needed in French Text-to-Speech (TTS) synthesis to disambiguate the pronunciation of homograph heterophones and liaison instances, and eventually to model intonational contours. A core problem in part-of-speech tagging for French TTS is deciding which tagset the tagger should use and which tagset TTS needs. We carried out experiments with tagsets of several sizes and with several tagging algorithms to investigate this problem. Our experimental results suggest that there may be an optimal tagset for part-of-speech disambiguation in French TTS. This optimal tagset contains slightly more tags than the tagset that TTS needs for pronunciation disambiguation and intonational modeling. In our experiments, the optimal tagset gives a tagging accuracy of 98.4% for TTS when a trigram Hidden Markov Model tagger is used.
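
The distinction between the tagger's tagset and the smaller tagset needed by TTS can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the tag names, the mapping `TAGGER_TO_TTS`, and the helper functions are hypothetical, chosen only to show how accuracy "for TTS" might be computed by projecting a finer tagset onto a coarser one before scoring.

```python
from typing import Dict, List

# Hypothetical mapping from a finer tagger tagset to a coarser TTS tagset.
# The tag names are illustrative assumptions, not taken from the paper.
TAGGER_TO_TTS: Dict[str, str] = {
    "DET": "DET",
    "PREP": "PREP",
    "NOUN_SG": "NOUN",
    "NOUN_PL": "NOUN",
    "VERB_3PL": "VERB",
}


def project(tags: List[str], mapping: Dict[str, str]) -> List[str]:
    """Collapse each tagger tag onto its coarser TTS tag."""
    return [mapping[tag] for tag in tags]


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# "Les poules du couvent couvent": the second "couvent" is a verb and is
# pronounced differently from the noun, so the NOUN/VERB choice matters for TTS.
gold = ["DET", "NOUN_PL", "PREP", "NOUN_SG", "VERB_3PL"]

# Hypothetical tagger output with two errors: a number error on "poules"
# and a noun/verb confusion on the final "couvent".
predicted = ["DET", "NOUN_SG", "PREP", "NOUN_SG", "NOUN_SG"]

# Accuracy on the tagger's own (finer) tagset.
fine_accuracy = accuracy(predicted, gold)

# Accuracy after projecting both sequences onto the TTS tagset.
tts_accuracy = accuracy(
    project(predicted, TAGGER_TO_TTS),
    project(gold, TAGGER_TO_TTS),
)

print(f"fine-grained accuracy: {fine_accuracy:.2f}")
print(f"TTS-level accuracy:    {tts_accuracy:.2f}")
```

In this toy example the number error disappears after projection, because both tags map to the same TTS tag, while the noun/verb confusion still counts as an error. This is one way a tagger tagset larger than the TTS tagset can still be scored against what TTS actually needs.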

Similar resources

Internal and external tagsets in part-of-speech tagging

We present an approach to statistical part-of-speech tagging that uses two different tagsets, one for its internal and one for its external representation. The internal tagset is used in the underlying Markov model, while the external tagset constitutes the output of the tagger. The internal tagset can be modified and optimized to increase tagging accuracy (with respect to the external tagset). We...

Training and Evaluation of POS Taggers on the French MULTITAG Corpus

The explicit introduction of morphosyntactic information into statistical machine translation approaches is receiving considerable attention. The current freely available Part of Speech (POS) taggers for the French language are based on a limited tagset which does not account for some inflectional particularities. Moreover, there is a lack of a unified framework of training and evaluatio...

Linguistic Issues in Grace (Evaluation of Part-of-Speech Tagging for French)

GRACE is the first large-scale evaluation program of taggers for French. This experiment made it possible to compare the Part-of-Speech tags assigned by various taggers on a common corpus of literary and journalistic texts. The evaluation relied on the acceptance by the participants of a reference formalism for morpho-syntactic description (the reference tagset) used by an expert to ta...

Semantic Role Labelling with minimal resources: Experiments with French

This paper describes a series of French semantic role labelling experiments which show that a small set of manually annotated training data is superior to a much larger set containing semantic role labels which have been projected from a source language via word alignment. Using universal part-of-speech tags and dependencies makes little difference over the original fine-grained tagset and depe...

STTS 2.0? Improving the Tagset for the Part-of-Speech-Tagging of German Spoken Data

Part-of-speech tagging (POS-tagging) of spoken data requires different means of annotation than POS-tagging of written and edited texts. In order to capture the features of German spoken language, a distinct tagset is needed to respond to the kinds of elements which only occur in speech. In order to create such a coherent tagset the most prominent phenomena of spoken language need to be analyze...

Publication date: 2002